{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook will go over how to install this repo on an external server to run the training and inference.\n",
    "To begin, we'll first need to clone the repo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 136
    },
    "colab_type": "code",
    "id": "e_cuSIyezXEp",
    "outputId": "b694fb03-5c0b-4951-fe8f-13cf2a0f4c25"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cloning into 'crnn-audio-classification'...\n",
      "remote: Enumerating objects: 127, done.\u001b[K\n",
      "remote: Counting objects: 100% (127/127), done.\u001b[K\n",
      "remote: Compressing objects: 100% (87/87), done.\u001b[K\n",
      "remote: Total 127 (delta 63), reused 101 (delta 37), pack-reused 0\u001b[K\n",
      "Receiving objects: 100% (127/127), 3.55 MiB | 0 bytes/s, done.\n",
      "Resolving deltas: 100% (63/63), done.\n"
     ]
    }
   ],
   "source": [
    "!git clone https://github.com/ksanjeevan/crnn-audio-classification.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also need to install the [torchaudio-contrib package](https://github.com/keunwoochoi/torchaudio-contrib). This is simple as cloneing the repo and using `pip` to install it "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "colab_type": "code",
    "id": "utIhHHd83zGS",
    "outputId": "ad1d5444-3d27-4c45-8780-ab69c9f384cf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cloning into 'torchaudio-contrib'...\n",
      "remote: Enumerating objects: 54, done.\u001b[K\n",
      "remote: Counting objects: 100% (54/54), done.\u001b[K\n",
      "remote: Compressing objects: 100% (33/33), done.\u001b[K\n",
      "remote: Total 209 (delta 22), reused 34 (delta 11), pack-reused 155\u001b[K\n",
      "Receiving objects: 100% (209/209), 1.89 MiB | 0 bytes/s, done.\n",
      "Resolving deltas: 100% (99/99), done.\n",
      "Obtaining file:///home/jupyter/crnn-audio-classification%20-%20UrbanSound8k/torchaudio-contrib\n",
      "Requirement already satisfied: torch in /opt/anaconda3/lib/python3.7/site-packages (from torchaudio-contrib==0.1) (1.0.1.post2)\n",
      "Installing collected packages: torchaudio-contrib\n",
      "  Running setup.py develop for torchaudio-contrib\n",
      "Successfully installed torchaudio-contrib\n"
     ]
    }
   ],
   "source": [
    "!git clone https://github.com/keunwoochoi/torchaudio-contrib\n",
    "!pip install -e torchaudio-contrib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll do some cleanup and move the repo into the root folder, which is optional"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "3IoVML3kzYmV"
   },
   "outputs": [],
   "source": [
    "!cp -r crnn-audio-classification/* ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll install the other required packages. Note tensorboardX is optional. If you want to install tensorboardX, you'll also need to install Tensorflow as well"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 411
    },
    "colab_type": "code",
    "id": "fVx3QyvN2b4s",
    "outputId": "f2b3e674-d72f-474f-9f0f-41047c57a68e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: SoundFile in /opt/anaconda3/lib/python3.7/site-packages (0.10.2)\n",
      "Requirement already satisfied: cffi>=1.0 in /opt/anaconda3/lib/python3.7/site-packages (from SoundFile) (1.11.5)\n",
      "Requirement already satisfied: pycparser in /opt/anaconda3/lib/python3.7/site-packages (from cffi>=1.0->SoundFile) (2.19)\n",
      "Collecting git+https://github.com/ksanjeevan/torchparse.git\n",
      "  Cloning https://github.com/ksanjeevan/torchparse.git to /tmp/pip-req-build-cxyj2nui\n",
      "Requirement already satisfied (use --upgrade to upgrade): torchparse==0.1 from git+https://github.com/ksanjeevan/torchparse.git in /opt/anaconda3/lib/python3.7/site-packages\n",
      "Requirement already satisfied: numpy>=1.16.2 in /opt/anaconda3/lib/python3.7/site-packages (from torchparse==0.1) (1.16.3)\n",
      "Requirement already satisfied: torch>=1.0 in /opt/anaconda3/lib/python3.7/site-packages (from torchparse==0.1) (1.0.1.post2)\n",
      "Building wheels for collected packages: torchparse\n",
      "  Running setup.py bdist_wheel for torchparse ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /tmp/pip-ephem-wheel-cache-pndy0yu8/wheels/53/64/f2/c60bf851fcf5d5363538889a115dd68d728474bbc22ef7d280\n",
      "Successfully built torchparse\n",
      "Requirement already satisfied: tensorboardX in /opt/anaconda3/lib/python3.7/site-packages (1.6)\n",
      "Requirement already satisfied: six in /opt/anaconda3/lib/python3.7/site-packages (from tensorboardX) (1.12.0)\n",
      "Requirement already satisfied: protobuf>=3.2.0 in /opt/anaconda3/lib/python3.7/site-packages (from tensorboardX) (3.7.1)\n",
      "Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.7/site-packages (from tensorboardX) (1.16.3)\n",
      "Requirement already satisfied: setuptools in /opt/anaconda3/lib/python3.7/site-packages (from protobuf>=3.2.0->tensorboardX) (40.6.3)\n",
      "Collecting pysoundfile\n",
      "  Downloading https://files.pythonhosted.org/packages/2a/b3/0b871e5fd31b9a8e54b4ee359384e705a1ca1e2870706d2f081dc7cc1693/PySoundFile-0.9.0.post1-py2.py3-none-any.whl\n",
      "Requirement already satisfied: cffi>=0.6 in /opt/anaconda3/lib/python3.7/site-packages (from pysoundfile) (1.11.5)\n",
      "Requirement already satisfied: pycparser in /opt/anaconda3/lib/python3.7/site-packages (from cffi>=0.6->pysoundfile) (2.19)\n",
      "Installing collected packages: pysoundfile\n",
      "Successfully installed pysoundfile-0.9.0.post1\n"
     ]
    }
   ],
   "source": [
    "!pip install SoundFile\n",
    "!pip install git+https://github.com/ksanjeevan/torchparse.git"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#optional\n",
    "!pip install tensorboardX\n",
    "!pip install tensorboard\n",
    "!pip install tensorflow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading the UrbanSound8k dataset\n",
    "To download the dataset, navigation to the [UrbanSoundDataSet webpage](https://urbansounddataset.weebly.com/urbansound8k.html) and navigate to the bottom of the page. \n",
    "There will be a simple [form](https://urbansounddataset.weebly.com/download-urbansound8k.html) that you will need to fill out before getting access to the dataset.\n",
    "\n",
    "Once its filled out, you will receive a url to download the dataset. Copy that link, and paste into the `{urbansound8k link}`. using `wget`, we'll download the url to the notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget {urbansound8k link}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once its downloaded, we will then need to untar the file. Just replace `{urbansound8k_downloaded_file}` with the name of the file. \n",
    "Optionally, we can removed the downloaded file here as well"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 289
    },
    "colab_type": "code",
    "id": "mDILa9uqDOrV",
    "outputId": "b5961199-dead-4e2f-f1c5-b91acb914015"
   },
   "outputs": [],
   "source": [
    "!tar -zxvf {urbansound8k_downloaded_file}\n",
    "!rm -f {urbansound8k_downloaded_file}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing the config file\n",
    "\n",
    "The config file is used to build out the training model. The only thing you will *need* to change is the path to the dataset.\n",
    "which is located in `data[\"path\"]`.\n",
    "You may also want to change the number of epochs(`data[\"train\"][\"epochs\"]`), when testing. Running the training with 10 epochs took about 10 minutes on a GPU. (You are running this on a GPU right :))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "7rJflAUKC4kU"
   },
   "outputs": [],
   "source": [
    "json_config = {\n",
    "    \"name\"          :   \"Urban Testing\",\n",
    "    \"data\"          :   {\n",
    "                            \"type\"      :   \"CSVDataManager\",\n",
    "                            \"path\"      :   \"UrbanSound8K\",\n",
    "                            \"format\"    :   \"audio\",\n",
    "                            \"loader\"    :   {\n",
    "                                                \"shuffle\"       : True,\n",
    "                                                \"batch_size\"    : 24,\n",
    "                                                \"num_workers\"   : 4,\n",
    "                                                \"drop_last\"     : True\n",
    "                                            },\n",
    "                            \"splits\"    :   {\n",
    "                                                \"train\" : [1,2,3,4,5,6,7,8,9], \n",
    "                                                \"val\"   : [10]                                            \n",
    "                                            }\n",
    "                        },\n",
    "    \"transforms\"    :   {\n",
    "                            \"type\"      :   \"AudioTransforms\",\n",
    "                            \"args\"      :   {\n",
    "                                                \"channels\"       : \"avg\",\n",
    "                                                \"noise\"    : [0.3, 0.001],\n",
    "                                                \"crop\"     : [0.4, 0.25]\n",
    "                                            }\n",
    "                        },\n",
    "    \"optimizer\"     :   {\n",
    "                            \"type\"      :   \"Adam\",\n",
    "                            \"args\"      :   {\n",
    "                                                \"lr\"            : 0.002,\n",
    "                                                \"weight_decay\"  : 0.01,\n",
    "                                                \"amsgrad\"       : True\n",
    "                                            }\n",
    "                        },\n",
    "    \"lr_scheduler\"   :   {\n",
    "                            \"type\"      :   \"StepLR\",\n",
    "                            \"args\"      :   {\n",
    "                                                \"step_size\" : 10,\n",
    "                                                \"gamma\"     : 0.5\n",
    "                                            }\n",
    "                        },\n",
    "    \"model\"         :   {\n",
    "                            \"type\"      :   \"AudioCRNN\"\n",
    "                        },\n",
    "    \"train\"         :   {\n",
    "                            \"loss\"      :   \"nll_loss\",\n",
    "                            \"epochs\"    :   100,\n",
    "                            \"save_dir\"  :   \"saved_cv/\",\n",
    "                            \"save_p\"    :   1,\n",
    "                            \"verbosity\" :   2,\n",
    "                            \n",
    "                            \"monitor\"   :   \"min val_loss\",\n",
    "                            \"early_stop\":   8,\n",
    "                            \"tbX\"       :   True\n",
    "                        },\n",
    "    \"metrics\"       :   \"classification_metrics\"\n",
    "\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll write this json out, in order for the model to read in this updated json file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "9Y43AjFuF9Yf"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "with open('my-config.json', 'w') as json_file:  \n",
    "  json.dump(json_config, json_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "Finally, we can start training the model. We'll be passing 3 parameters, with the first parameter being the action we want to take, which is `train`. You can use `train` to train the model, or `eval`, to perform evalation on the model. The `-c` parameter is the config file, which we just created, and `--cfg`, which is the layer configuration of the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 272
    },
    "colab_type": "code",
    "id": "GZ5AVLf5zjv6",
    "outputId": "8605180e-767a-4323-8b1d-49c771e946ca"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Compose(\n",
      "    ProcessChannels(mode=avg)\n",
      "    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)\n",
      "    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)\n",
      "    ToTensorAudio()\n",
      ")\n",
      "AudioCRNN(\n",
      "  (spec): MelspectrogramStretch(num_bands=128, fft_len=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])\n",
      "  (net): ModuleDict(\n",
      "    (convs): Sequential(\n",
      "      (conv2d_0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])\n",
      "      (batchnorm2d_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
      "      (elu_0): ELU(alpha=1.0)\n",
      "      (maxpool2d_0): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)\n",
      "      (dropout_0): Dropout(p=0.1)\n",
      "      (conv2d_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])\n",
      "      (batchnorm2d_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
      "      (elu_1): ELU(alpha=1.0)\n",
      "      (maxpool2d_1): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)\n",
      "      (dropout_1): Dropout(p=0.1)\n",
      "      (conv2d_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])\n",
      "      (batchnorm2d_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
      "      (elu_2): ELU(alpha=1.0)\n",
      "      (maxpool2d_2): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)\n",
      "      (dropout_2): Dropout(p=0.1)\n",
      "    )\n",
      "    (recur): LSTM(128, 64, num_layers=2, bidirectional=True)\n",
      "    (dense): Sequential(\n",
      "      (dropout_3): Dropout(p=0.3)\n",
      "      (batchnorm1d_0): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
      "      (linear_0): Linear(in_features=128, out_features=10, bias=True)\n",
      "    )\n",
      "  )\n",
      ")\n",
      "Trainable parameters: 256266\n",
      "^C\n",
      "Traceback (most recent call last):\n",
      "  File \"run.py\", line 179, in <module>\n",
      "    train_main(config, args.resume)\n",
      "  File \"run.py\", line 117, in train_main\n",
      "    train_logger=train_logger)\n",
      "  File \"/home/jupyter/crnn-audio-classification - UrbanSound8k/train/trainer.py\", line 19, in __init__\n",
      "    super(Trainer, self).__init__(model, loss, metrics, optimizer, resume, config, train_logger)\n",
      "  File \"/home/jupyter/crnn-audio-classification - UrbanSound8k/train/base_trainer.py\", line 21, in __init__\n",
      "    self.model = model.to(self.device)\n",
      "  File \"/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py\", line 381, in to\n",
      "    return self._apply(convert)\n",
      "  File \"/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py\", line 187, in _apply\n",
      "    module._apply(fn)\n",
      "  File \"/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py\", line 187, in _apply\n",
      "    module._apply(fn)\n",
      "  File \"/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py\", line 117, in _apply\n",
      "    self.flatten_parameters()\n",
      "  File \"/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py\", line 113, in flatten_parameters\n",
      "    self.batch_first, bool(self.bidirectional))\n",
      "KeyboardInterrupt\n"
     ]
    }
   ],
   "source": [
    "!python run.py train -c my-config.json --cfg crnn.cfg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inference\n",
    "\n",
    "After we have trainined the model, we can run inference on it.\n",
    "Call the `run.py` with 2 parameters. The first is a path to a sample audio audio file. For this example, we'll use a random audio sample from the UrbanSound8K dataset. The second parameter will be the path to the model checkpoint. It will look something like this `saved_cv/{timestamp}/checkoints/model_best.pth`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "9H0Naj2rzq4K",
    "outputId": "a63019bc-67d5-4fd9-d8df-7d8212f7acad"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dog_bark 0.9858338236808777\n"
     ]
    }
   ],
   "source": [
    "!python run.py UrbanSound8K/audio/fold10/100795-3-0-0.wav -r saved_cv/0515_171217/checkpoints/model_best.pth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When running our inference, we got a 98% confidence of the supplied audio to be a dog bark."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "crnn-audio-classification - UrbanSound8k.ipynb",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
